Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[warm-reboot][sad] enhance test to cover different port types and ope… #3853

Merged
merged 3 commits into from
Aug 16, 2021

Conversation

stepanblyschak
Copy link
Contributor

@stepanblyschak stepanblyschak commented Jul 22, 2021

…r state in sad scenarios

Signed-off-by: Stepan Blyschak [email protected]

Description of PR

Summary: Enhance warm reboot sad path test cases.
Fixes #3683

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

The motivation is to fill the test gap to test different ports admin and operationally down before warm-reboot and check their status after warm-reboot. Sanity will make sure the ports changed are back up.

How did you do it?

  • For port down cases implemented a special logic for port selection based on port physical properties.
  • For port down cases implemented also port operational state change.
  • As of second enhancement in the list there was a neccessary refactoring made, few sad operations were taken out from ptf script and implemented in pytest. Old sad path wasn't changed to support ansible compatiblity. This is partial refactoring only for enabling us to implement first two items from this list. Other sad operations need to be taken to pytest as well in the future.

How did you verify/test it?

Running warm_reboot_multi_sad.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Your Name and others added 2 commits July 21, 2021 10:49
…r state in sad scenarios

Signed-off-by: Stepan Blyschak <[email protected]>
Signed-off-by: Stepan Blyschak <[email protected]>
@stepanblyschak stepanblyschak requested review from sujinmkang and a team as code owners July 22, 2021 09:46
@sujinmkang
Copy link
Contributor

@stepanblyschak Can you please resolve the conflict and update PR? Can we check the port status with selecting based on either the port status before rebooting or the minigraph information?

@stepanblyschak
Copy link
Contributor Author

@stepanblyschak Can you please resolve the conflict and update PR? Can we check the port status with selecting based on either the port status before rebooting or the minigraph information?

@sujinmkang The port selection is based on minigraph information, in the minigraph there are only up ports

@stepanblyschak
Copy link
Contributor Author

/azpw run

@stepanblyschak stepanblyschak reopened this Aug 2, 2021
@sujinmkang sujinmkang requested a review from prgeor August 3, 2021 15:13
@roy-sror
Copy link
Contributor

roy-sror commented Aug 5, 2021

/azpw run

1 similar comment
@stepanblyschak
Copy link
Contributor Author

/azpw run

@qiluo-msft
Copy link
Contributor

Is it possible to separate the part of "cover different port types" into a standalone PR? that's a bug fix, not an enhancement.

@stepanblyschak
Copy link
Contributor Author

Is it possible to separate the part of "cover different port types" into a standalone PR? that's a bug fix, not an enhancement.

@qiluo-msft This is not a bug fix. It was not designed to "cover different port types" based on the code, unfrotunately there is no test plan document to prove it, so I treat it as an enhancement.

@qiluo-msft qiluo-msft merged commit 88c6b37 into sonic-net:master Aug 16, 2021
lolyu pushed a commit to lolyu/sonic-mgmt that referenced this pull request Aug 18, 2021
…r state in sad scenarios (sonic-net#3853)

### Description of PR

Summary: Enhance warm reboot sad path test cases.
Fixes sonic-net#3683

### Type of change
- [x] Test case(new/improvement)


### Approach
#### What is the motivation for this PR?

The motivation is to fill the test gap to test different ports admin and operationally down before warm-reboot and check their status after warm-reboot. Sanity will make sure the ports changed are back up.

#### How did you do it?

- For port down cases implemented a special logic for port selection based on port physical properties.
- For port down cases implemented also port operational state change.
- As of second enhancement in the list there was a neccessary refactoring made, few sad operations were taken out from ptf script and implemented in pytest. Old sad path wasn't changed to support ansible compatiblity. This is partial refactoring only for enabling us to implement first two items from this list. Other sad operations need to be taken to pytest as well in the future.

#### How did you verify/test it?

Running warm_reboot_multi_sad.
@vaibhavhd
Copy link
Contributor

This commit has caused several LAG flaps in the warm reboot sad cases across different platforms.
I tested by reverting this commit locally, and the cases started passing. I am trying to find out what in this commit caused the regression, but suggest reverting this PR until there is a fix.

Also, looks like revert function is called twice now (once from ptf script, and once from pytest in the cases that are moved to Pytest)
https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/files/ptftests/advanced-reboot.py#L1050

@stepanblyschak
Copy link
Contributor Author

@vaibhavhd Hi, could you please share the command line to reproduce? I wonder wether you have expirienced a bug in SONiC teamd warm-reboot that caused exactly this issue - sonic-net/sonic-buildimage#8227. In this case, could you please share the dump after failing test case to verify this?

Cases that are moved to pytest replace cases in ptf script. In the code you mentioned self.sad_revert() is called when self.sad_oper and self.sad_handle are set, which is not the case for pytest cases. Please check this part of the code - https://github.com/Azure/sonic-mgmt/pull/3853/files#diff-bfbd05e640c01905f096016ec8fc695dba2364fc44f5e600fd087841d173cbaeR527

@vaibhavhd
Copy link
Contributor

I am trying to evaluate that if it is a SONiC image issue. I just run Pytest test_warm_reboot_sad. The interesting thing is that the failures are consistently seen after this sonic-mgmt PR is merged. Also, it fails for the cases that are moved to pytest. I will update more when I have some logs to share.

@vaibhavhd
Copy link
Contributor

@stepanblyschak, the failures that I was earlier seeing were infact due to issue 8227, and were fixed by your PR. When the fix is included in 202012 image, I did not hit the same issues anymore. Thank you for finding and fixing this.

vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
…r state in sad scenarios (sonic-net#3853)

### Description of PR

Summary: Enhance warm reboot sad path test cases.
Fixes sonic-net#3683

### Type of change
- [x] Test case(new/improvement)


### Approach
#### What is the motivation for this PR?

The motivation is to fill the test gap to test different ports admin and operationally down before warm-reboot and check their status after warm-reboot. Sanity will make sure the ports changed are back up.

#### How did you do it?

- For port down cases implemented a special logic for port selection based on port physical properties.
- For port down cases implemented also port operational state change.
- As of second enhancement in the list there was a neccessary refactoring made, few sad operations were taken out from ptf script and implemented in pytest. Old sad path wasn't changed to support ansible compatiblity. This is partial refactoring only for enabling us to implement first two items from this list. Other sad operations need to be taken to pytest as well in the future.

#### How did you verify/test it?

Running warm_reboot_multi_sad.
vaibhavhd added a commit that referenced this pull request Nov 28, 2021
…terfaces (#4758)

Fix advance-reboot SAD cases in testcase test_warm_reboot_multi_sad.
The cause for failure was when incorrect participating member interfaces in a LAG are selected.
This issue supposedly started after #3853
As part of PR 3853, some of the SAD case handling was moved out of ptf-tests. Due to this, the port selection for some cases is not done on ptf scripts.
Specifically, presently the traffic is sent through even the LAGs which are brought down. This leads to part of the downstream traffic always being dropped - ultimately leading to warm-up failure.
AntonHryshchuk pushed a commit to AntonHryshchuk/sonic-mgmt that referenced this pull request Jan 4, 2022
…terfaces (sonic-net#4758)

Fix advance-reboot SAD cases in testcase test_warm_reboot_multi_sad.
The cause for failure was when incorrect participating member interfaces in a LAG are selected.
This issue supposedly started after sonic-net#3853
As part of PR 3853, some of the SAD case handling was moved out of ptf-tests. Due to this, the port selection for some cases is not done on ptf scripts.
Specifically, presently the traffic is sent through even the LAGs which are brought down. This leads to part of the downstream traffic always being dropped - ultimately leading to warm-up failure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Warmboot] Request test to verify port status after warmboot
5 participants